Abstract

In this paper, we obtain bounds on the probability of convergence to the optimal solution for the compact Genetic Algorithm (cGA) and the Population Based Incremental Learning (PBIL). We also give a sufficient condition for convergence of these algorithms to the optimal solution and compute a range of possible values of the parameters of these algorithms for which they converge to the optimal solution with a confidence level.
1 Introduction

Although univariate Estimation of Distribution Algorithms (EDAs) have low efficiency in solving difficult problems, it is still important to study them for two reasons. First, due to their simplicity in terms of memory usage and computational complexity, they may be quite useful in memory-constrained applications, especially for implementing evolvable hardware. Second, it is advisable to begin with a simple EDA to develop the methods needed for the analysis of more complicated EDAs (Droste, 2005). Three of the simplest univariate EDAs (UEDAs) are the cGA (Harik et al., 1999b), the PBIL (Baluja and Caruana, 1995), and the UMDA (Mühlenbein, 1997), which is a special case of the PBIL. These algorithms initialize a probability vector (PV) in which each component is the parameter of a Bernoulli distribution, set to 0.5, and randomly generate solutions by sampling from this PV. Some of the generated solutions are selected based on their fitness values and a selection scheme. Next, the PV is updated using a learning rule. This process of adaptation continues until some criterion is satisfied, for example, until the PV converges.

A few researchers have studied different theoretical aspects of these simple algorithms, including their convergence and time complexity. The first theoretical study of the convergence of the PBIL with an arbitrary learning rate in $(0,1)$ is carried out by Hohfeld and Rudolph (1997). It is argued that the PBIL converges almost surely to the maximum point of linear functions. We will return to this result later in this paper. Assuming a sufficiently small learning rate, Gonzalez et al. (2000) model the PBIL using a discrete dynamical system and demonstrate that the local optima of an injective function with respect to Hamming distance are stable fixed points of the PBIL. They also study the strong dependency of the PBIL on the initial values of the PV and the learning rate (Gonzalez et al., 2001). In an interesting paper, Zhang (2004) studies the stability of fixed points of limit models of the UMDA under a two-tournament selection scheme and shows that the local optima with respect to Hamming distance are asymptotically stable. In (Rastegar and Meybodi, 2005), the PBIL is studied for the case that the population size is sufficiently large, and dynamical properties of the algorithm are derived for different selection schemes. Also, in (Rastegar and Hariri, 2006b,a), it is proven that the PBIL and the cGA with sufficiently small learning rates do not show any cyclic or chaotic behavior but instead converge weakly to the local maxima with respect to Hamming distance when they optimize an injective function. Time complexity is another aspect of these algorithms studied by a few researchers. Droste (2005) carries out the first rigorous study of the time complexity of the cGA for linear pseudo-boolean functions. He shows that not all linear functions have the same asymptotic runtime. Chen et al. (2007) study the time complexity of the PBIL and the UMDA. They extend the concept of convergence to convergence time, estimate an upper bound on the mean first hitting times of the UMDA and the PBIL on a simple pseudo-modular function, and analyze the mean first hitting time of the PBIL on a hard problem. The result shows that the PBIL may spend exponential time to find the global optimum.

Another important topic is the effect of the initial parameters, such as the initial PV, the learning rate, and the population size, on the probability that the cGA and the PBIL converge to optimal solutions, called the optimal convergence probability, which to the best knowledge of the author has not been studied deeply. The importance of this topic becomes apparent when one notices, for example, that when the learning rate is not small enough, the cGA is unlikely to converge to a good solution; it may therefore appear reasonable to make the learning rate as small as possible in order to find solutions of high quality. However, if the learning rate is too small, the cGA will waste time processing unnecessary individuals, and this may result in unacceptably slow performance. The problem is to find a learning rate which is small enough to permit a correct exploration of the search space without wasting computational resources (Harik et al., 1999b).

A common approach to computing the optimal convergence probability of an evolutionary algorithm (EA) with a finite search set is to model the algorithm as a finite-state Markov chain. However, it is rarely possible to obtain analytical expressions, since the transition probability matrices of these Markov chains are intractable even for simple optimization problems. Sometimes assumptions regarding the population size, the operators, and the optimization problem help to estimate the optimal convergence probability. These assumptions usually reduce the state space and, therefore, the size of the transition matrices, sometimes even turning them into matrices with special properties.

The idea of how to approach a population-based EA with recombination and selection but without mutation is introduced in (Harik et al., 1999b). It is argued that the dynamics of such EAs are similar to the dynamics of specific random walks. The obtained results are based on many approximations, without any estimate of the possible errors. Rudolph (2005) proposes a more solid theoretical foundation for this argument. His work gives a mathematical model to lower bound the optimal convergence probability of a variant of non-generational EAs optimizing the OneMax problem. The approach is still based on modeling the EA by random walks on a finite space, yet he employs some estimations which make the argument not completely mathematically sound. Since the cGA mimics the behavior of a binary non-generational EA, one can use Rudolph's idea to bound the optimal convergence probability of the cGA. However, even if we build a completely rigorous mathematical foundation upon (Harik et al., 1999b; Rudolph, 2005), we cannot study the optimal convergence probability of the PBIL by the same approach, since the PBIL cannot be modeled by a finite Markov chain. This motivates us to find a more general approach covering a wider range of EAs.

A broad mathematical framework is considered in (Norman, 1972) that includes stochastic learning models with distance-diminishing operators in metric spaces for experiments with finite numbers of responses and simple reinforcement. One main result of this framework is to define superregular and subregular functions and then use them to bound the probability that a learning algorithm converges to each of the possible desired actions. In (Lakshmivarahan and Thathachar, 1976), it is shown that the distance-diminishing property is not necessary and that this method can be used in a wider range of applications. This method has been applied successfully to many different adaptive systems; see for example (Thathachar and Arvind, 1998) and references therein.

In this paper, we will use this method to lower bound the optimal convergence probability of the cGA and the PBIL. Then we will show that, for a specific class of functions, the cGA with a sufficiently small learning rate and the PBIL with a sufficiently small learning rate or a sufficiently large population size converge almost surely to the maximum. Further, using the lower bounds, we will derive upper bounds on the learning rates and a lower bound on the population size which ensure that the algorithms converge to the optimum with a predefined confidence level. As one will see, the advantage of the approach used in this paper is that it helps us to study several properties of the cGA, the PBIL, and possibly other types of EAs under the same umbrella.

This paper is organized as follows: Section 2 describes the cGA and the PBIL precisely. Section 3 reviews the basic mathematical background relevant to this paper. In Section 4, bounds on the optimal convergence probability are computed for the cGA and the PBIL. Lastly, in Section 5, the computation is carried out for linear functions and several simulations are given. The paper concludes with insights toward future research.

2 Algorithms 

Let $\Omega = \{0,1\}^n$ and let $f : \Omega \to \mathbb{R}$ be a pseudo-boolean function. The goal is to maximize $f$. Assume an EDA represents the probability distribution of the population of individuals by a PV $p(k) = (p_1(k), \ldots, p_n(k))$, where $p_i(k)$ refers to the probability of obtaining a value of 1 in the $i$th component of the population of individuals in the $k$th generation. Let us define the initial PV as $p(1) = p^0$, where $p^0 = (0.5, \ldots, 0.5)$.

A simple EDA is the PBIL, introduced by Baluja and Caruana (1995). At iteration $k$, $N$ individuals are drawn from the PV $p(k)$, and $\lambda$ of these individuals are selected using a selection scheme and named $w^{(1)}(k), w^{(2)}(k), \ldots, w^{(\lambda)}(k)$. These selected individuals are then used to modify the PV according to a Hebbian-inspired rule of the form

$$ p(k+1) = (1-\alpha)\,p(k) + \alpha\,\frac{1}{\lambda}\sum_{t=1}^{\lambda} w^{(t)}(k), \tag{1} $$

where $\alpha \in (0,1)$ is a learning parameter. In this paper, we use two-tournament selection $\lambda$ times to find the $w^{(t)}(k)$s ($1 \le t \le \lambda$) as follows. For each $1 \le t \le \lambda$, two random individuals $c^{(1)}(k)$ and $c^{(2)}(k)$ are generated on the basis of $p(k)$ and then compete with each other: $w^{(t)}(k) = c^{(1)}(k)$, $l^{(t)}(k) = c^{(2)}(k)$ (resp. $w^{(t)}(k) = c^{(2)}(k)$, $l^{(t)}(k) = c^{(1)}(k)$) when $f(c^{(1)}(k)) \ge f(c^{(2)}(k))$ (resp. $f(c^{(2)}(k)) > f(c^{(1)}(k))$). Clearly, in our case, $N = 2\lambda$.
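The PBIL update (1) with two-tournament selection can be sketched in a few lines of Python. This is only an illustrative sketch: the function names `sample` and `pbil_step`, the OneMax objective (`sum`), and the parameter values are assumptions made for the example and do not come from the paper. Ties are resolved in favour of the first individual, matching the convention above.

```python
import random

def sample(p, rng):
    """Draw one individual from the probability vector p."""
    return [1 if rng.random() < pi else 0 for pi in p]

def pbil_step(p, f, alpha, lam, rng):
    """One PBIL update as in (1), using lam two-tournament selections."""
    winners = []
    for _ in range(lam):
        c1, c2 = sample(p, rng), sample(p, rng)
        winners.append(c1 if f(c1) >= f(c2) else c2)  # c1 wins ties
    n = len(p)
    mean_w = [sum(w[d] for w in winners) / lam for d in range(n)]
    return [(1 - alpha) * p[d] + alpha * mean_w[d] for d in range(n)]

# Hypothetical run: OneMax on 4 bits, alpha = 0.1, lambda = 5.
rng = random.Random(0)
p = [0.5] * 4
for _ in range(200):
    p = pbil_step(p, sum, alpha=0.1, lam=5, rng=rng)
```

Each call consumes $2\lambda$ samples, and the new PV is a convex combination of the old PV and the winners' mean, so it stays in $[0,1]^n$.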

Harik et al. (1999a) present the cGA, which belongs to the EDA family. In this algorithm, two-tournament selection is used just once per iteration. At the $k$th iteration of the optimization process, two individuals $c^{(1)}(k)$ and $c^{(2)}(k)$ are generated on the basis of $p(k)$. Then $w(k) = w^{(1)}(k)$ and $l(k) = l^{(1)}(k)$, and $p(k)$ is updated as follows:

$$ p(k+1) = p(k) + \alpha\,(w(k) - l(k)). \tag{2} $$

To prevent the $p_i$s from getting smaller than 0 or larger than 1, we let $\alpha$ be equal to $1/m$, where $m$ is an even positive integer. The next lemma is useful for our analysis (Hohfeld and Rudolph, 1997; Rastegar and Hariri, 2006b).
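Before the lemma is stated, the cGA update (2) can be sketched as follows. This is a minimal sketch under stated assumptions: the name `cga_run`, the iteration cap `max_iters`, and the OneMax objective are hypothetical choices for illustration. Exact rational arithmetic (`fractions.Fraction`) is used so that the PV stays on the grid $\{0, 1/m, \ldots, 1\}$ without rounding error.

```python
import random
from fractions import Fraction

def cga_run(f, n, m, rng, max_iters=100_000):
    """Iterate update (2) with alpha = 1/m until every component is 0 or 1."""
    alpha = Fraction(1, m)
    p = [Fraction(1, 2)] * n  # m even, so 1/2 lies on the grid
    for _ in range(max_iters):
        if all(pd in (0, 1) for pd in p):
            break  # absorbed at a vertex of the hypercube
        c1 = [1 if rng.random() < pd else 0 for pd in p]
        c2 = [1 if rng.random() < pd else 0 for pd in p]
        winner, loser = (c1, c2) if f(c1) >= f(c2) else (c2, c1)
        p = [pd + alpha * (wd - ld) for pd, wd, ld in zip(p, winner, loser)]
    return p

# Hypothetical run: OneMax on 5 bits with m = 20 (alpha = 0.05).
final_pv = cga_run(sum, n=5, m=20, rng=random.Random(1))
```

Once a component reaches 0 or 1, both sampled individuals agree on that bit, so the component no longer moves; this is why no clipping is needed.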



Lemma 1. In the 2-tournament selection method, let $P(w^{(t)}(k) = y)$ (resp. $P(l^{(t)}(k) = y)$) be the probability of obtaining $y$ as the winner (resp. loser) individual at the $k$th iteration. Then

$$ P\big(w^{(t)}(k) = y\big) = P_k(y)\Big[\sum_{f(z) < f(y)} P_k(z) + \sum_{f(z) \le f(y)} P_k(z)\Big], \tag{3} $$

$$ P\big(l^{(t)}(k) = y\big) = P_k(y)\Big[\sum_{f(z) > f(y)} P_k(z) + \sum_{f(z) \ge f(y)} P_k(z)\Big], \tag{4} $$

where $P_k(y)$ denotes the probability of sampling the individual $y$ at iteration $k$.

It is clear that, for a given $k$, the $w^{(t)}(k)$s are independent and identically distributed (i.i.d.) random vectors, and therefore $P(w^{(i)}(k) = y) = P(w^{(j)}(k) = y)$ for $1 \le i, j \le \lambda$.
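Lemma 1 can be checked numerically on a small instance by comparing the closed form (3) with a brute-force enumeration of all ordered pairs $(c^{(1)}, c^{(2)})$. The sketch below is illustrative only; the helper names, the PV `p`, and the objective `f` (chosen so that ties occur) are assumptions made for the example.

```python
from itertools import product

def prob(y, p):
    """P_k(y): probability of sampling the bit string y from the PV p."""
    out = 1.0
    for yi, pi in zip(y, p):
        out *= pi if yi else 1.0 - pi
    return out

def winner_dist_lemma(f, p):
    """Winner distribution according to the closed form (3)."""
    pts = list(product((0, 1), repeat=len(p)))
    return {y: prob(y, p) * (sum(prob(z, p) for z in pts if f(z) < f(y))
                             + sum(prob(z, p) for z in pts if f(z) <= f(y)))
            for y in pts}

def winner_dist_brute(f, p):
    """Winner distribution by enumerating every ordered pair (c1, c2)."""
    pts = list(product((0, 1), repeat=len(p)))
    dist = dict.fromkeys(pts, 0.0)
    for c1 in pts:
        for c2 in pts:
            w = c1 if f(c1) >= f(c2) else c2  # c1 wins ties
            dist[w] += prob(c1, p) * prob(c2, p)
    return dist

p = [0.3, 0.6, 0.8]
f = lambda x: 3 * x[0] + 2 * x[1] + x[2]  # (1,0,0) and (0,1,1) tie
lemma, brute = winner_dist_lemma(f, p), winner_dist_brute(f, p)
```

The two sums in (3) differ only in how ties are treated: the strict sum covers the case in which $y$ was drawn second, the non-strict sum the case in which it was drawn first.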

3 Mathematical Preliminary 

In this section, we define (sub-, super-)regular functions¹ and describe their connection to the probability that a stochastic process converges to an absorbing state, by stating some results similar to those of (Norman, 1972; Lakshmivarahan and Thathachar, 1976) for time-homogeneous Markov processes.

Suppose $\{\xi(k)\}_{k=1}^{\infty}$ is a Markov process with stationary transition kernel $K$ defined on the compact set $S \subset \mathbb{R}^n$, i.e. $K : S \times \sigma(S) \to \mathbb{R}$, where $\sigma(S)$ is the Borel sigma-algebra generated by $S$. Suppose that $\{\xi(k)\}_{k=1}^{\infty}$ converges almost surely to some point in $A = \{s_0, \ldots, s_{M-1}\} \subset S$. Let $C(S)$ be the space of all continuous functions from $S$ to $\mathbb{R}$. Since $S$ is compact, every function in $C(S)$ is bounded. Let $A_1, A_2, \ldots, A_r$ be a partition of $A$ where, for $i \ne j$, $A_i$ and $A_j$ are noncommunicating classes, meaning that the probability of going from a point in $A_i$ to a point in $A_j$ is zero. Given $1 \le i \le r$, define

$$ \Gamma_{A_i}(s) = P\Big(\lim_{k\to\infty} \xi(k) \in A_i \,\Big|\, \xi(1) = s\Big) $$

as the probability that $\xi(k)$ converges to some element of $A_i$, provided that the initial value $\xi(1)$ is $s$.

For $\psi : S \to \mathbb{R}$, the operator $U$ is defined by

$$ U\psi(s) = E\{\psi(\xi(k+1)) \mid \xi(k) = s\} $$

for $k \ge 1$. Note that $U$ is linear and preserves non-negative functions. Further,

$$ U^k\psi(s) = U\,U^{k-1}\psi(s) = E\{\psi(\xi(k+1)) \mid \xi(1) = s\} $$

for all $k \ge 1$, with $U^1\psi(s) = U\psi(s)$. The following lemma shows that $\Gamma_{A_i}(\cdot)$ ($i = 1, \ldots, r$) satisfies a functional equation with appropriate boundary conditions.

¹ Another commonly used name in probability theory is (sub-, super-)harmonic functions.
Lemma 2. $\Gamma_{A_i}(\cdot)$ is a solution of the functional equation $U\psi = \psi$ with the boundary conditions $\psi(s) = 1$ if $s \in A_i$ and $\psi(s) = 0$ if $s \in A_j$, $j \ne i$. Also, if $h \in C(S)$ is another solution of the equation with the same boundary conditions, then $h = \Gamma_{A_i}$.

Remark. This result holds without the assumption that $h$ is a continuous function. Please refer to (Durrett, 1995), Section 5.2, Exercise 2.6 for more information.



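Proof. Clearly $\Gamma_{A_i}$ satisfies the boundary conditions. Also,

$$ \begin{aligned} U\Gamma_{A_i}(s) &= \int_S \Gamma_{A_i}(y)\,K(s,dy) \\ &= \lim_{k\to\infty}\int_S P\big(\xi(k)\in A_i \mid \xi(1)=y\big)\,K(s,dy) = \lim_{k\to\infty}\int_S K^{(k)}(y,A_i)\,K(s,dy) \\ &= \lim_{k\to\infty} K^{(k+1)}(s,A_i) = \Gamma_{A_i}(s). \end{aligned} $$

Suppose $h \in C(S)$ is another solution of the equation. Since $h$ is a bounded function, for a given $s \in S$, $\{U^k h(s)\}_{k=1}^{\infty}$ is a sequence of bounded real numbers. Thus, by the Bolzano-Weierstrass Theorem, there is a convergent subsequence $\{U^{k_j} h(s)\}_{j=1}^{\infty}$. Now an application of the Bounded Convergence Theorem (Durrett, 1995) in (5) gives

$$ \begin{aligned} h(s) = Uh(s) = \cdots = U^{k}h(s) = \cdots &= \lim_{j\to\infty} U^{k_j}h(s) \\ &= \lim_{j\to\infty} E\{h(\xi(k_j+1)) \mid \xi(1) = s\} \\ &= E\Big\{\lim_{j\to\infty} h(\xi(k_j+1)) \,\Big|\, \xi(1) = s\Big\} && (5) \\ &= E\Big\{h\Big(\lim_{j\to\infty} \xi(k_j+1)\Big) \,\Big|\, \xi(1) = s\Big\} \\ &= E\Big\{h\Big(\lim_{k\to\infty} \xi(k)\Big) \,\Big|\, \xi(1) = s\Big\} && (6) \\ &= \sum_{s'\in A} h(s')\,P\Big(\lim_{k\to\infty}\xi(k) = s' \,\Big|\, \xi(1) = s\Big) \\ &= P\Big(\lim_{k\to\infty}\xi(k) \in A_i \,\Big|\, \xi(1) = s\Big) = \Gamma_{A_i}(s), \end{aligned} $$

where (6) comes from the fact that each subsequence of an almost surely convergent sequence converges almost surely to the same limit random variable. □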

Since solving such an equation is a difficult task, an attempt is made to determine bounds on $\Gamma_{A_i}(s)$ ($i = 1, \ldots, r$) by means of functions which satisfy functional inequalities. In this context subregular and superregular functions are defined. The function $\psi : S \to \mathbb{R}$ is subregular (resp. superregular) if and only if $U\psi(s) \ge \psi(s)$ (resp. $U\psi(s) \le \psi(s)$) for all $s \in S$.

Lemma 3. If $\psi \in C(S)$ is subregular (resp. superregular) with $\psi(s) = 1$ when $s \in A_i$ and $\psi(s) = 0$ when $s \in A_j$, $j \ne i$, then $\psi(s) \le \Gamma_{A_i}(s)$ (resp. $\psi(s) \ge \Gamma_{A_i}(s)$) for all $s \in S$.

Proof. The proof is similar to that of Lemma 2. □

Lemma 3 reduces the problem of obtaining bounds on $\Gamma_{A_i}(s)$ to finding subregular and superregular functions with appropriate boundary conditions. No general method of identifying superregular and subregular functions is known. One has to start with a promising functional form and choose the parameters of the function so that the required inequality is satisfied. Finding a promising functional form and the best values for its parameters is the most difficult part of the procedure. The following lemma can be useful in simplifying this procedure.

Lemma 4. Let $\psi_i \in C(S)$, $i = 1, \ldots, n$, be monotonic increasing subregular functions. Then $\psi(\cdot) = \prod_{i=1}^{n}\psi_i(\cdot)$ is a subregular function.

Proof. The application of the Chebyshev Integral Inequality (Tong, 1997) implies

$$ U\prod_{i=1}^{n}\psi_i(s) \ge \prod_{i=1}^{n} U\psi_i(s). $$

Considering the subregularity of the $\psi_i$s shows $\psi(s) \le U\psi(s)$. □

Using the above lemma to find a subregular function leads to a more conservative result; however, it reduces the difficulty of the problem.

4 Optimal Convergence Probability 

In this section, an application of Lemma 3 provides some bounds on the optimal convergence probability of the cGA and the PBIL for a class of binary functions defined in the following.

Definition (Property 1). A function $f : \Omega \to \mathbb{R}$ satisfies Property 1 if $f(x \vee e_i) \ge f(x \wedge \bar{e}_i)$ for all $x \in \Omega$ and $1 \le i \le n$, where $e_i$ is the $i$-th unit vector of dimension $n$, $\bar{e}_i$ is its binary complement, and $\wedge$ and $\vee$ are component-wise "AND" and "OR", respectively.

This property essentially states that setting one bit to 0 does not increase the function value. All linear functions $f(x) = \sum_{i=1}^{n}\gamma_i x_i$ with $\gamma_i > 0$ have Property 1. There are also some nonlinear functions, such as $f(x) = \sum_{i=1}^{n}\gamma_i x_i + \prod_{i=1}^{n} x_i$, having this property. From this point forward, we assume that $f$ satisfies Property 1.
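For small $n$, Property 1 can be verified by exhaustive enumeration. The following sketch is illustrative; the function name `has_property_1` and the two test objectives are assumptions made for the example, not part of the paper.

```python
from itertools import product

def has_property_1(f, n):
    """Check f(x OR e_i) >= f(x AND NOT e_i) for every x and every bit i."""
    for x in product((0, 1), repeat=n):
        for i in range(n):
            x_set = x[:i] + (1,) + x[i + 1:]    # x with bit i forced to 1
            x_clear = x[:i] + (0,) + x[i + 1:]  # x with bit i forced to 0
            if f(x_set) < f(x_clear):
                return False
    return True

linear = lambda x: 2 * x[0] + x[1] + 3 * x[2]  # positive weights
parity = lambda x: sum(x) % 2                  # setting a bit can lower f
```

A linear function with positive weights passes the check, whereas the parity function fails it, since flipping a bit from 0 to 1 can decrease its value.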

4.1 Lower-bound for the cGA 

The cGA shows complicated non-linear behaviour. To analyze the optimal convergence probability of this algorithm, we approach the problem as follows. We first prove that the algorithm converges to a point in $\Omega$. Then, we decompose the problem into tractable sub-problems and compute bounds on the optimal convergence probability for each sub-problem by bounding the interaction among the sub-problems. Finally, we integrate the partial bounds.

Let the random sequence $\{p(k)\}_{k=1}^{\infty}$ be generated by the cGA while optimizing the function $f$. It is clear that this sequence is a time-homogeneous Markov chain on $\{0, \alpha, 2\alpha, \ldots, 1\}^n$ with the points of $\Omega$ as the absorbing states and $\{0, \alpha, 2\alpha, \ldots, 1\}^n - \Omega$ as the transient states; thus the a.s. convergence of the cGA to a point in $\Omega$ is guaranteed. However, we will prove this fact using a second approach, developed in (Hohfeld and Rudolph, 1997), since the latter can easily be used to show the convergence of the PBIL, whereas the first approach does not work for the PBIL; moreover, the second approach gives some insight into the behaviour of each $p_d(k)$.

Lemma 5. For every $1 \le d \le n$, $\lim_{k\to\infty} p_d(k) = p_d^*$ exists and $p_d^* \in \{0,1\}$ almost surely.






Evolutionary Computation Volume x, Number x



On the Optimal Convergence Probability of Univariate Estimation of Distribution Algorithms 



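Proof. Equation (2) implies $E[p_d(k+1) \mid p(k)] = p_d(k) + \alpha E[w_d(k) - l_d(k) \mid p(k)]$ for all $1 \le d \le n$. Since $f$ satisfies Property 1, for a given $x \in \Omega$, $f(x \vee e_d) \ge f(x \wedge \bar{e}_d)$. Hence

$$ \sum_{f(z) < f(x \vee e_d)} P_k(z) \ge \sum_{f(z) < f(x \wedge \bar{e}_d)} P_k(z), \tag{7} $$

$$ \sum_{f(z) \le f(x \vee e_d)} P_k(z) \ge \sum_{f(z) \le f(x \wedge \bar{e}_d)} P_k(z), \tag{8} $$

$$ \sum_{f(z) > f(x \vee e_d)} P_k(z) \le \sum_{f(z) > f(x \wedge \bar{e}_d)} P_k(z), \tag{9} $$

$$ \sum_{f(z) \ge f(x \vee e_d)} P_k(z) \le \sum_{f(z) \ge f(x \wedge \bar{e}_d)} P_k(z). \tag{10} $$

Then, based on Lemma 1, we have

$$ \begin{aligned} \frac{P(w(k) = x \vee e_d)}{P_k(x \vee e_d)} &= \sum_{f(z) < f(x \vee e_d)} P_k(z) + \sum_{f(z) \le f(x \vee e_d)} P_k(z) \\ &\ge \sum_{f(z) < f(x \wedge \bar{e}_d)} P_k(z) + \sum_{f(z) \le f(x \wedge \bar{e}_d)} P_k(z) = \frac{P(w(k) = x \wedge \bar{e}_d)}{P_k(x \wedge \bar{e}_d)}, \end{aligned} \tag{11} $$

and in a similar way

$$ \frac{P(l(k) = x \vee e_d)}{P_k(x \vee e_d)} \le \frac{P(l(k) = x \wedge \bar{e}_d)}{P_k(x \wedge \bar{e}_d)}. \tag{12} $$

Define $q_d(x,k) = \prod_{j=1, j \ne d}^{n} p_j(k)^{x_j}(1 - p_j(k))^{1 - x_j}$. It is easy to see that $P_k(x \wedge \bar{e}_d) = (1 - p_d(k))\,q_d(x,k)$ and $P_k(x \vee e_d) = p_d(k)\,q_d(x,k)$. Insertion of these identities into the inequalities (11) and (12) and some simplification show that

$$ \begin{aligned} P(w(k) = x \vee e_d) &\ge p_d(k)\big(P(w(k) = x \wedge \bar{e}_d) + P(w(k) = x \vee e_d)\big), \\ P(l(k) = x \vee e_d) &\le p_d(k)\big(P(l(k) = x \wedge \bar{e}_d) + P(l(k) = x \vee e_d)\big). \end{aligned} $$

Thus, the above inequalities give

$$ \begin{aligned} E\{p_d(k+1) \mid p(k)\} - p_d(k) &= \alpha E\{w_d(k) - l_d(k) \mid p(k)\} \\ &= \alpha \sum_{x \in \Omega} x_d\big(P(w(k) = x) - P(l(k) = x)\big) \\ &\ge \frac{\alpha}{2}\,p_d(k) \sum_{x \in \Omega} \big(P(w(k) = x \wedge \bar{e}_d) + P(w(k) = x \vee e_d)\big) \\ &\quad - \frac{\alpha}{2}\,p_d(k) \sum_{x \in \Omega} \big(P(l(k) = x \wedge \bar{e}_d) + P(l(k) = x \vee e_d)\big) \\ &= \frac{\alpha}{2}\,p_d(k) \sum_{x \in \Omega} 2P(w(k) = x) - \frac{\alpha}{2}\,p_d(k) \sum_{x \in \Omega} 2P(l(k) = x) \\ &= \alpha p_d(k) - \alpha p_d(k) = 0. \end{aligned} $$

This shows that $\{p_d(k)\}_{k=1}^{\infty}$ is a submartingale which is positive and uniformly bounded by one. Thus the Martingale Convergence Theorem (Durrett, 1995) asserts that $\lim_{k\to\infty} p_d(k) = p_d^*$ exists almost surely. If $p_d^* \notin \{0,1\}$, then $p_d(k) \ne p_d(k+1)$ with a non-zero probability for all $k$, which is a contradiction. Hence $p_d^* \in \{0,1\}$, and $\{0,1\}$ forms the set of absorbing states for the Markov process $\{p_d(k)\}$. This completes the proof. □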
We are now in a position to apply the results of Section 3 to find a bound on the optimal convergence probability of the cGA. Note that $\{p(k)\}_{k=1}^{\infty}$ is a time-homogeneous Markov chain with the compact state set $S = \{0, \alpha, 2\alpha, \ldots, 1\}^n$, converging almost surely to $A = \Omega$. Without loss of generality, we assume that $x^* = (1, \ldots, 1)$ is the only maximum point of the function $f$. Partition $A$ into the set containing the optimal point, $A_1 = \{(1, \ldots, 1)\}$, and the set of non-optimal points, $A_2 = \Omega - A_1$; then the optimal convergence probability of the cGA is $\Gamma_{A_1}((0.5, \ldots, 0.5))$, the probability that $\{p(k)\}$ converges to $x^*$.

The important step is to find an appropriate functional form $\psi(\cdot) : S \to \mathbb{R}$ such that $\psi(\cdot)$ has the same boundary values as $\Gamma_{A_1}(\cdot)$, that is, $\psi(p) = 1$ for $p \in A_1$ and $\psi(p) = 0$ for $p \in A_2$. The first candidate for such a functional form is

$$ \psi(p) = \frac{1 - e^{-b\prod_{d=1}^{n} p_d}}{1 - e^{-b}}, $$

where $b > 0$ is to be chosen. In this case, the best value of $b$, giving the tightest lower bound, is the largest value for which $U\psi(p) \ge \psi(p)$ holds, i.e. for which $\psi$ is a subregular function. To compute the largest value of $b$, we need the transition probability matrix of the Markov process $\{p(k)\}_{k=1}^{\infty}$. However, this matrix is intractable, even for simple optimization functions, and accordingly we need to find another functional form. One way is to first decompose the PV $p(k) = (p_1(k), \ldots, p_n(k))$ into sub-PVs. Then, for a given sub-PV, we introduce a subregular function depending only on this sub-PV by bounding its interaction with the other sub-PVs. The larger the sub-PVs are, the sharper the result we get, but at the same time the complexity of the approach increases. Finally, we obtain our subregular function by multiplying the subregular functions of the sub-PVs. For the sake of simplicity in notation and computation, we consider sub-PVs of size one, i.e. we look at the subregular function

$$ \psi(p) = \prod_{d=1}^{n}\psi_d(p), \tag{13} $$

with

$$ \psi_d(p) = \frac{1 - e^{-b_d p_d}}{1 - e^{-b_d}}, \tag{14} $$

where, for each $1 \le d \le n$, $p_d$ is the $d$th component of $p$ and $b_d > 0$ is to be chosen. Since the $\psi_d(\cdot)$s are continuous, $\psi(\cdot) \in C(S)$. Again, the best values for the $b_d$s are the largest values for which the $\psi_d(\cdot)$ are subregular functions. A direct computation of the $b_d$s from the inequality $U\psi(p) \ge \psi(p)$ is a tedious task; however, finding the $b_d$s for which the $\psi_d$ are subregular is simple.
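Let us define

$$ \begin{aligned} H_d(k) = \frac{1}{2}\Big[&P\big(f(c^{(1)}(k)) \ge f(c^{(2)}(k)) \,\big|\, c_d^{(1)}(k) = 1,\ c_d^{(2)}(k) = 0\big) \\ &+ P\big(f(c^{(2)}(k)) > f(c^{(1)}(k)) \,\big|\, c_d^{(2)}(k) = 1,\ c_d^{(1)}(k) = 0\big)\Big]. \end{aligned} \tag{15} $$

$H_d(k)$ is the quantity that models the interaction of $p_d(k)$ with the other PV components at iteration $k$. It is an average of two conditional probabilities, so $0 \le H_d(k) \le 1$. Using this notation we have

$$ \begin{aligned} P(w_d(k) - l_d(k) = 1 \mid p(k)) &= P\big\{f(c^{(1)}(k)) \ge f(c^{(2)}(k)),\ c_d^{(1)}(k) = 1,\ c_d^{(2)}(k) = 0 \,\big|\, p(k)\big\} \\ &\quad + P\big\{f(c^{(2)}(k)) > f(c^{(1)}(k)),\ c_d^{(1)}(k) = 0,\ c_d^{(2)}(k) = 1 \,\big|\, p(k)\big\} \\ &= 2H_d(k)\,p_d(k)(1 - p_d(k)), \end{aligned} \tag{16} $$

$$ \begin{aligned} P(w_d(k) - l_d(k) = 0 \mid p(k)) &= P(w_d(k) = 1,\ l_d(k) = 1 \mid p(k)) + P(w_d(k) = 0,\ l_d(k) = 0 \mid p(k)) \\ &= 1 - 2p_d(k)(1 - p_d(k)), \end{aligned} \tag{17} $$

$$ \begin{aligned} P(w_d(k) - l_d(k) = -1 \mid p(k)) &= 1 - P(w_d(k) - l_d(k) = 0 \mid p(k)) - P(w_d(k) - l_d(k) = 1 \mid p(k)) \\ &= 2(1 - H_d(k))\,p_d(k)(1 - p_d(k)). \end{aligned} \tag{18} $$

Note that

$$ \begin{aligned} E\{p_d(k+1) - p_d(k) \mid p(k)\} &= \alpha P(w_d(k) - l_d(k) = 1 \mid p(k)) - \alpha P(w_d(k) - l_d(k) = -1 \mid p(k)) \\ &= 2\alpha\,(2H_d(k) - 1)\,p_d(k)(1 - p_d(k)). \end{aligned} \tag{19} $$

By Lemma 5, the left-hand side of (19) is always non-negative; therefore $1 \le 2H_d(k)$. To remove the dependence on time from the interaction among the sub-PVs, we define $H_d = \min_k H_d(k)$ and use the $H_d$s for finding the subregular functions.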

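Lemma 6. Let us define $\psi_d : S \to \mathbb{R}$ as in (14). If $H_d \ne 1$, then $\psi_d(\cdot)$ is subregular provided that $b_d \le \frac{1}{\alpha}\ln\frac{H_d}{1 - H_d}$. If $H_d = 1$, then $\psi_d(\cdot)$ is subregular for all $b_d > 0$.

Proof. Some computations and using (16)-(18) give

$$ \begin{aligned} U\psi_d(p) - \psi_d(p) &= E\{\psi_d(p_d(k+1)) \mid p(k) = p\} - \psi_d(p) \\ &= E\Big\{\frac{1 - e^{-b_d p_d(k+1)}}{1 - e^{-b_d}} \,\Big|\, p(k)\Big\} - \frac{1 - e^{-b_d p_d(k)}}{1 - e^{-b_d}} \\ &= \frac{e^{-b_d p_d(k)} - E\big\{e^{-b_d p_d(k) - b_d\alpha(w_d(k) - l_d(k))} \,\big|\, p(k)\big\}}{1 - e^{-b_d}} \\ &= \frac{e^{-b_d p_d(k)} - e^{-b_d p_d(k)}\,E\big\{e^{-b_d\alpha(w_d(k) - l_d(k))} \,\big|\, p(k)\big\}}{1 - e^{-b_d}} \\ &= \frac{e^{-b_d p_d(k)}}{1 - e^{-b_d}}\Big\{1 - \big[P(w_d(k) - l_d(k) = 1 \mid p(k))\,e^{-b_d\alpha} \\ &\qquad + P(w_d(k) - l_d(k) = -1 \mid p(k))\,e^{b_d\alpha} + P(w_d(k) - l_d(k) = 0 \mid p(k))\big]\Big\} \\ &= \frac{2e^{-b_d p_d(k)}}{1 - e^{-b_d}}\,p_d(k)(1 - p_d(k))\big(1 - H_d(k)e^{-b_d\alpha} - (1 - H_d(k))e^{b_d\alpha}\big). \end{aligned} $$

Hence, $\psi_d(\cdot)$ is a subregular function if $1 \ge H_d(k)e^{-b_d\alpha} + (1 - H_d(k))e^{b_d\alpha}$, or equivalently

$$ (1 - H_d(k))\,e^{2b_d\alpha} - e^{b_d\alpha} + H_d(k) \le 0. \tag{20} $$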
If $H_d(k) = 1$, the inequality trivially holds. Suppose $H_d(k) < 1$. Since $2H_d(k) \ge 1$, solving (20) shows

$$ \frac{1 - \sqrt{(2H_d(k) - 1)^2}}{2(1 - H_d(k))} \le e^{b_d\alpha} \le \frac{1 + \sqrt{(2H_d(k) - 1)^2}}{2(1 - H_d(k))} = \frac{H_d(k)}{1 - H_d(k)}. \tag{21} $$

By inequality (21), $\psi_d(\cdot)$ is subregular if

$$ b_d \le \frac{1}{\alpha}\min_{1 \le k < \infty}\ln\frac{H_d(k)}{1 - H_d(k)} = \frac{1}{\alpha}\ln\frac{H_d}{1 - H_d}, $$

which completes the proof. □
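The threshold in Lemma 6 can be checked numerically: the left-hand side of (20) is negative below $b_d = \frac{1}{\alpha}\ln\frac{H_d}{1-H_d}$, vanishes at it, and is positive above it. The values of `h` and `alpha` in this small sketch are arbitrary illustrative choices.

```python
import math

def lhs_20(h, b, alpha):
    """Left-hand side of (20): (1 - h) e^{2 b alpha} - e^{b alpha} + h."""
    y = math.exp(b * alpha)
    return (1 - h) * y * y - y + h

h, alpha = 0.7, 0.05                      # hypothetical H_d and learning rate
b_max = math.log(h / (1 - h)) / alpha     # threshold from Lemma 6
below = lhs_20(h, 0.5 * b_max, alpha)     # inside the admissible range
at = lhs_20(h, b_max, alpha)              # boundary of the range
above = lhs_20(h, 1.1 * b_max, alpha)     # outside the admissible range
```

In terms of $y = e^{b_d\alpha}$, (20) is a quadratic with roots $1$ and $\frac{H_d}{1 - H_d}$, which is why the admissible range is exactly the interval between them.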

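The following main theorem is a direct result of Lemmas 3 and 4.

Theorem 7. Let $p^0 = (0.5, \ldots, 0.5)$ be the initial PV and $x^*$ be the optimal solution. Then

$$ \prod_{d=1}^{n}\Big(1 + \Big(\frac{1 - H_d}{H_d}\Big)^{\frac{1}{2\alpha}}\Big)^{-1} \le \Gamma_{A_1}(p^0) = P\Big(\lim_{k\to\infty} p(k) = x^* \,\Big|\, p(1) = p^0\Big). \tag{22} $$

Proof. Let $\psi(\cdot)$ be defined as in (13). Since the $\psi_d(\cdot)$s are monotonic increasing, by Lemma 4 $\psi(\cdot)$ is subregular if each $\psi_d(\cdot)$ is subregular. Therefore, according to Lemmas 3 and 6, with $b_d = \frac{1}{\alpha}\ln\frac{H_d}{1 - H_d}$ we have

$$ \Gamma_{A_1}(p^0) \ge \psi(p^0) = \prod_{d=1}^{n}\frac{1 - e^{-\frac{b_d}{2}}}{1 - e^{-b_d}} = \prod_{d=1}^{n}\frac{1}{1 + e^{-\frac{b_d}{2}}} = \prod_{d=1}^{n}\frac{1}{1 + e^{-\frac{1}{2\alpha}\ln\frac{H_d}{1 - H_d}}} = \prod_{d=1}^{n}\frac{1}{1 + \big(\frac{1 - H_d}{H_d}\big)^{\frac{1}{2\alpha}}}, $$

which completes the proof. □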
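The lower bound of Theorem 7 is cheap to evaluate once the $H_d$s are known. The sketch below evaluates the product on the left-hand side of (22); the function name `cga_lower_bound` and the sample values of the $H_d$s are assumptions chosen only for illustration.

```python
def cga_lower_bound(h, alpha):
    """Product bound (22) for the initial PV (0.5, ..., 0.5).

    h is the list of H_d values (each in (0, 1]) and alpha the learning rate.
    """
    bound = 1.0
    for hd in h:
        ratio = (1.0 - hd) / hd
        bound *= 1.0 / (1.0 + ratio ** (1.0 / (2.0 * alpha)))
    return bound

# Hypothetical values: ten components with H_d = 0.6.
coarse = cga_lower_bound([0.6] * 10, alpha=0.1)
fine = cga_lower_bound([0.6] * 10, alpha=0.01)
```

When $H_d = 0.5$ every factor equals $1/2$ regardless of $\alpha$, so the bound is $2^{-n}$ and carries no information; for $H_d > 0.5$ the bound increases toward 1 as $\alpha$ decreases.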



Remark. A similar result is reported in (Rudolph, 2005) for a binary non-generational evolutionary algorithm (the cGA) optimizing the OneMax problem. However, there are two questionable points in the argument. To understand these points, we review the argument. Each component $p_d(k)$ of the probability vector is modeled by a random walk on $S = \{0, 1, 2, \ldots, m\}$, where $m = \alpha^{-1}$. Let $P_{i,i+1}(d,k)$, $P_{i,i-1}(d,k)$, and $P_{i,i}(d,k)$ be the probabilities that $p_d(k+1) = p_d(k) + \alpha$, $p_d(k+1) = p_d(k) - \alpha$, and $p_d(k+1) = p_d(k)$ when $p_d(k) = i\alpha$. $P_{i,i+1}(d,k)$, $P_{i,i-1}(d,k)$, and $P_{i,i}(d,k)$ form the transition probabilities of the $d$th random walk with $m+1$ states $0, 1, \ldots, m$. The states $1, \ldots, m-1$ are transient, and $0$ and $m$ are the absorbing states of the random walk. Thus we have

$$ \begin{aligned} P_{i,i}(d,k) &= 1 - 2i\alpha(1 - i\alpha), \\ P_{i,i+1}(d,k) &= 2i\alpha(1 - i\alpha)H_d(k), \\ P_{i,i-1}(d,k) &= 2i\alpha(1 - i\alpha)(1 - H_d(k)), \qquad \forall\, 1 \le i < m, \\ P_{0,0}(d,k) &= 1, \\ P_{m,m}(d,k) &= 1. \end{aligned} $$

Clearly, these random walks are state-dependent, time-inhomogeneous Markov processes. Replacing the transition probabilities of these random walks with the new transition probabilities

$$ \begin{aligned} \tilde{P}_{i,i+1}(d,k) &= H_d(k), \\ \tilde{P}_{i,i-1}(d,k) &= 1 - H_d(k), \\ \tilde{P}_{i,i}(d,k) &= 0, \qquad \forall\, 1 \le i < m, \\ \tilde{P}_{0,0}(d,k) &= 1, \\ \tilde{P}_{m,m}(d,k) &= 1 \end{aligned} $$

gives $n$ new random walks with the same absorption probabilities for the states $0$ and $m$ as in the original random walks. The first fallacy arises when the author uses "Equation (1)" of that paper, derived originally for the absorption probability of a time-homogeneous random walk, to obtain the absorption probability of the new random walks, which are clearly not time-homogeneous. At the end, it is also concluded that a lower bound on the probability that $\{p(k)\}$ converges to $(1, \ldots, 1)$ is the product of lower bounds on the probabilities that the random walks $\{p_d(k)\}$ converge to 1. However, since $P(\lim_{k\to\infty} p_d(k) = 1 \mid p(1) = p^0) \ge P(\lim_{k\to\infty} p_1(k) = 1, \ldots, \lim_{k\to\infty} p_n(k) = 1 \mid p(1) = p^0)$ for each $1 \le d \le n$, it is not clear how to lower bound $P(\lim_{k\to\infty} p(k) = x^* \mid p(1) = p^0)$ by lower-bounding $P(\lim_{k\to\infty} p_d(k) = 1 \mid p(1) = p^0)$.

The bound on the optimal convergence probability can be used to show that, for sufficiently small $\alpha$, the cGA converges almost surely to the optimal solution of functions with Property 1. If $H_d > \frac{1}{2}$ (this is proven at least for linear functions in Section 5), then $\frac{1 - H_d}{H_d} < 1$ for all $1 \le d \le n$. Thus letting $\alpha \to 0$ in Theorem 7 completes the argument. Since some of the functions with Property 1, such as the OneMax, are not injective, this result can be considered complementary to that of (Rastegar and Hariri, 2006b).

Theorem 7 can further be used to determine a conservative range of possible values of the learning rate for which the cGA converges to the optimal solution with a confidence level $0 < \beta < 1$. It is clear that if

$$ 0 < \alpha \le \min_{1 \le d \le n}\frac{\ln(1 - H_d) - \ln H_d}{2\ln(\beta^{-1/n} - 1)} $$

(with $\beta > 2^{-n}$, so that the denominator is negative), then Theorem 7 gives $\beta \le P(\lim_{k\to\infty} p(k) = x^* \mid p(1) = p^0)$. This estimate is conservative, and we underestimate the actual range of values for the learning rate.
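The admissible range of the learning rate can be computed directly from the $H_d$s and the desired confidence level. The sketch below derives the largest $\alpha$ by requiring each factor of the bound in Theorem 7 to be at least $\beta^{1/n}$; it assumes $\frac{1}{2} < H_d < 1$ and $2^{-n} < \beta < 1$, and the function names and sample values are hypothetical.

```python
import math

def cga_lower_bound(h, alpha):
    """Product bound of Theorem 7 for the initial PV (0.5, ..., 0.5)."""
    bound = 1.0
    for hd in h:
        bound *= 1.0 / (1.0 + ((1.0 - hd) / hd) ** (1.0 / (2.0 * alpha)))
    return bound

def max_learning_rate(h, beta):
    """Largest alpha for which every factor of the bound is >= beta**(1/n)."""
    n = len(h)
    target = math.log(beta ** (-1.0 / n) - 1.0)  # negative when beta > 2**-n
    return min((math.log(1.0 - hd) - math.log(hd)) / (2.0 * target)
               for hd in h)

h_values = [0.6, 0.7, 0.65]   # hypothetical H_d values
beta = 0.95                   # desired confidence level
alpha_max = max_learning_rate(h_values, beta)
achieved = cga_lower_bound(h_values, alpha_max)
```

Because the condition is imposed factor by factor, `achieved` is at least `beta` and equals it only when all the $H_d$s coincide. In the cGA, $\alpha = 1/m$ with $m$ an even integer, so in practice one would take the smallest even $m \ge 1/\alpha_{\max}$.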

4.2 Lower-bound for the PBIL 

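In the remainder of this section, we obtain a lower bound for the optimal convergence probability of the PBIL. Let the random sequence $\{p(k)\}_{k=1}^{\infty}$ be generated by the PBIL while optimizing $f$. The state set of the time-homogeneous Markov process $\{p(k)\}_{k=1}^{\infty}$ is the compact set $S = [0,1]^n$. With an argument similar to that of Lemma 5, we can show that, for a given $1 \le d \le n$, $\{p_d(k)\}_{k=1}^{\infty}$ is a submartingale, $\lim_{k\to\infty} p_d(k) = p_d^*$ exists, and $p_d^* \in \{0,1\}$ almost surely. Therefore the absorbing set of $\{p(k)\}_{k=1}^{\infty}$ is $\Omega$, i.e. $A = \Omega$. Define $A_1$ and $A_2$ as before. A promising subregular function for computing a bound on the optimal convergence probability of the PBIL could be $\prod_{d=1}^{n}\psi_d(\cdot)$, with $\psi_d$ as in (14), where the $b_d > 0$ are to be chosen. One shows

$$ \begin{aligned} U\psi_d(p) - \psi_d(p) &= E\{\psi_d(p_d(k+1)) \mid p(k) = p\} - \psi_d(p) \\ &= E\Big\{\frac{1 - e^{-b_d p_d(k+1)}}{1 - e^{-b_d}} \,\Big|\, p(k)\Big\} - \frac{1 - e^{-b_d p_d(k)}}{1 - e^{-b_d}} \\ &= \frac{1}{1 - e^{-b_d}}\Big(e^{-b_d p_d(k)} - E\big\{e^{-b_d p_d(k+1)} \,\big|\, p(k)\big\}\Big) \\ &= \frac{1}{1 - e^{-b_d}}\Big(e^{-b_d p_d(k)} - e^{-b_d(1-\alpha)p_d(k)}\prod_{t=1}^{\lambda} E\big\{e^{-\frac{b_d\alpha}{\lambda}w_d^{(t)}(k)} \,\big|\, p(k)\big\}\Big) \\ &= \frac{e^{-b_d p_d(k)}}{1 - e^{-b_d}}\Big(1 - e^{b_d\alpha p_d(k)}\prod_{t=1}^{\lambda} E\big\{e^{-\frac{b_d\alpha}{\lambda}w_d^{(t)}(k)} \,\big|\, p(k)\big\}\Big). \end{aligned} $$

Since for all $i, j, k$

$$ P\big(w_d^{(i)}(k) = 1 \mid p(k)\big) = P\big(w_d^{(j)}(k) = 1 \mid p(k)\big), $$

we define $G_d(k) = P(w_d^{(t)}(k) = 1 \mid p(k))$. Therefore the rightmost side of the above expression is

$$ \frac{e^{-b_d p_d(k)}}{1 - e^{-b_d}}\Big(1 - e^{b_d\alpha p_d(k)}\big(G_d(k)e^{-\frac{b_d\alpha}{\lambda}} + 1 - G_d(k)\big)^{\lambda}\Big). $$

For a given $k$, let us define

$$ u(b_d, k) = 1 - e^{b_d\alpha p_d(k)}\big(G_d(k)e^{-\frac{b_d\alpha}{\lambda}} + 1 - G_d(k)\big)^{\lambda}. \tag{23} $$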



The fact that $G_d(k) = p_d^2(k) + 2p_d(k)(1 - p_d(k))H_d(k)$ shows that $G_d(k) = 1$ (resp. $G_d(k) = 0$) if and only if $p_d(k) = 1$ (resp. $p_d(k) = 0$). In these cases $u(b_d, k) = 0$ for all values of $b_d$. Assume $0 < G_d(k) < 1$ and $0 < p_d(k) < 1$. For a given $k$, computing the first derivative of $u(b_d, k)$ with respect to $b_d$, we have

$$ \begin{aligned} \frac{\partial u(b_d,k)}{\partial b_d} &= \alpha e^{b_d\alpha p_d(k)}\big(G_d(k)e^{-\frac{b_d\alpha}{\lambda}} + 1 - G_d(k)\big)^{\lambda - 1} \\ &\quad \times \big(G_d(k)e^{-\frac{b_d\alpha}{\lambda}}(1 - p_d(k)) - (1 - G_d(k))p_d(k)\big). \end{aligned} $$

Solving $\frac{\partial u(b_d,k)}{\partial b_d} = 0$ shows that $u(b_d, k)$ has only one critical point, at

$$ b_d^*(k) = \frac{\lambda}{\alpha}\ln\frac{(1 - p_d(k))G_d(k)}{p_d(k)(1 - G_d(k))} = \frac{\lambda}{\alpha}\ln\frac{p_d(k) + 2(1 - p_d(k))H_d(k)}{1 + p_d(k) - 2p_d(k)H_d(k)}. $$

Substituting $b_d^*(k)$ in (23) and simplifying, we have

$$ 1 - u(b_d^*(k), k) = \big(p_d(k) + 2(1 - p_d(k))H_d(k)\big)^{\lambda p_d(k)} \times \big(1 + p_d(k) - 2p_d(k)H_d(k)\big)^{\lambda(1 - p_d(k))}. \tag{24} $$

Note that a general form of the arithmetic-geometric means inequality indicates that

$$ a_1^{c_1}a_2^{c_2} \le \Big(\frac{c_1a_1 + c_2a_2}{c_1 + c_2}\Big)^{c_1 + c_2}, $$

where the $c_i$ and $a_i$ are nonnegative. An application of this inequality to the right-hand side of (24) implies that it is less than or equal to

$$ \Big(\frac{\lambda p_d(k)\big(p_d(k) + 2(1 - p_d(k))H_d(k)\big) + \lambda(1 - p_d(k))\big(1 + p_d(k) - 2p_d(k)H_d(k)\big)}{\lambda}\Big)^{\lambda} = 1, $$

meaning that $u(b_d^*(k), k) \ge 0$. Suppose that there is a $b' \in (0, b_d^*(k))$ such that $u(b', k) < 0$. Since $u(0, k) = 0$ and $u(b_d^*(k), k) \ge 0$, by continuity of $u(\cdot, k)$ with respect to $b_d$ in $(0, b_d^*(k))$, there is at least one local minimum (i.e. a critical point) of $u(\cdot, k)$ in this interval, which is a contradiction since $b_d^*(k)$ is the only critical point of $u(\cdot, k)$. Thus, $u(b', k) \ge 0$ for all $b' \in (0, b_d^*(k))$. On the other hand, for each $d$, $\psi_d$ is subregular if $u(b_d, k) \ge 0$ for all $k$. Therefore, $\psi_d$ is subregular if $0 < b_d \le \inf_k b_d^*(k)$. At this point, one needs to compute $\inf_k b_d^*(k)$. Some computation shows that for a given $k$

$$ \frac{\partial b_d^*(k)}{\partial H_d(k)} = \frac{2\lambda}{\alpha\big(1 + p_d(k) - 2p_d(k)H_d(k)\big)\big(p_d(k) + 2(1 - p_d(k))H_d(k)\big)} \ge 0 $$

and

$$ \frac{\partial b_d^*(k)}{\partial p_d(k)} = \frac{\lambda\,(2H_d(k) - 1)^2}{\alpha\big(1 + p_d(k) - 2p_d(k)H_d(k)\big)\big(p_d(k) + 2(1 - p_d(k))H_d(k)\big)} \ge 0. $$

Thus $b_d^*(k)$ is an increasing function with respect to $H_d(k)$ and $p_d(k)$, implying that $b_d^*(k)$ attains its minimum value, $\frac{\lambda}{\alpha}\ln 2H_d$, when $H_d(k) = H_d$ and $p_d(k) \to 0$. Thus, an argument similar to that of Theorem 7 shows that by selecting $b_d = \frac{\lambda}{\alpha}\ln 2H_d$ for each $1 \le d \le n$, we have

$$ \prod_{d=1}^{n}\Big(1 + (2H_d)^{-\frac{\lambda}{2\alpha}}\Big)^{-1} \le \Gamma_{A_1}(p^0) = P\Big(\lim_{k\to\infty} p(k) = x^* \,\Big|\, p(1) = p^0\Big). \tag{25} $$

Letting $\frac{\alpha}{\lambda} \to 0$ shows that, for sufficiently small $\alpha$ or large $\lambda$, the PBIL converges almost surely to the optimal solution for functions with Property 1, a result complementary to (Gonzalez et al., 2000; Rastegar and Hariri, 2006a). Again, one can compute a conservative range of possible values of the ratio of the learning rate to the population size for which the PBIL converges to the optimal solution with a confidence level $0 < \beta < 1$. Some computation shows that if

$$ 0 < \frac{\alpha}{\lambda} \le \min_{1 \le d \le n}\frac{-\ln 2H_d}{2\ln(\beta^{-1/n} - 1)}, $$

then $\beta \le P(\lim_{k\to\infty} p(k) = x^* \mid p(1) = p^0)$.
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Remark. The maximum value computed for each bd for the cGA is optimal in the sense 



that if bd > ^In fz^/ then 1 20 i does not hold anymore, however, in the PBIL case, 
bd = ^ In 2Hd is not the optimal possible value for bd and one can improve the bounds 
for the optimal convergence probability of the PBIL by finding the maximum value of 
bd for which u{bd, k) > 0. 



Remark. Convergence of the PBIL is first studied in (Hohfeld and Rudolph 1997) for a linear function with maximum point $x^*$. Assuming $p(1) \in (0,1)^n$ and $\alpha \in (0,1)$, it is argued that since $E\{p_d(k)\}$ is strictly monotonic when $0 < p_d(k) < 1$ for $1 \le d \le n$ and $E\{p_d(k)\}$ is bounded above by unity, $p_d(k)$ converges in mean (and also almost surely) to $x_d^*$. However, it is proven in (Gonzalez et al., 2001) that for a 2-bit OneMax problem, $\{(p_1(k),p_2(k))\}_{k=1}^{\infty}$ converges "almost surely" to $(0,0)$ if $\alpha$ and $(p_1(1),p_2(1))$ are selected very close to 1 and $(0,0)$, respectively. This counterexample shows that the argument in (Hohfeld and Rudolph 1997) is not correct for all values of $\alpha \in (0,1)$. The fallacy lies in assuming that a strictly monotonic sequence tends to $x_d^*$ (unproven Theorem 2, same paper).

5 Computation of $H_d$s and Experimental Verification

Knowing the $H_d$s for a given function is essential for all of our results. In this section we compute the $H_d$s for some simple functions. Suppose $f(x) = \sum_{i=1}^{n}\gamma_i x_i$. Let us define $A(I,k) = \sum_{i \notin I}\gamma_i\bigl(c_i^{(a)}(k) - c_i^{(b)}(k)\bigr)$ for a subset $I \subseteq \{1,\ldots,n\}$. To simplify the notation, we assume that the $\gamma_i$s are natural numbers. However, with some adjustment in the notation, the following lemma holds for all positive real $\gamma_i$s.
First note that

$$\begin{aligned}
2H_d(k) &= P\bigl(A(\{d\},k) \ge -\gamma_d\bigr) + P\bigl(A(\{d\},k) < \gamma_d\bigr) \\
&= 1 + P\bigl(-\gamma_d \le A(\{d\},k) < \gamma_d\bigr).
\end{aligned}$$

Since $H_d(k)$ is a continuous function on the compact set $[0,1]^{n-1}$, it has a minimum and a maximum in $[0,1]^{n-1}$. Let $p^{(i)}(k)$ be the vector obtained by deleting the $i$-th component of $p(k)$. Fix $1 \le d \le n$. It is not hard to see that if, at iteration $k_0$, some components of $p^{(d)}(k_0)$ are in $\{0,1\}$ and the others are the same as those of $p^{(d)}(k)$, then $H_d(k_0) \ge H_d(k)$. Thus, the minimum of $H_d(k)$ is attained at a point $q \in (0,1)^{n-1}$. Suppose that $p^{(d)}(k) \in (0,1)^{n-1}$. Let $z_j(k) = a_j(k) - b_j(k)$, then

$$P(z_j(k) = -1) = P(z_j(k) = 1) = p_j(k)(1-p_j(k)). \tag{26}$$

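Fix $j \ne d$. Note that, using (26), $H_d(k)$ can be rewritten as follows:

$$\begin{aligned}
2H_d(k) - 1 &= P\bigl(-\gamma_d \le A(\{d\},k) < \gamma_d\bigr) \\
&= \sum_{i=-\gamma_d}^{\gamma_d-1} P\bigl(A(\{d\},k) = i\bigr) \\
&= \sum_{z=-1}^{1}\ \sum_{i=-\gamma_d}^{\gamma_d-1} P(z_j(k) = z)\,P\bigl(A(\{d,j\},k) + \gamma_j z = i\bigr) \\
&= \sum_{i=-\gamma_d}^{\gamma_d-1} P\bigl(A(\{d,j\},k) = i\bigr) + p_j(k)(1-p_j(k))\,S(j,k),
\end{aligned} \tag{27}$$

where

$$S(j,k) = \sum_{i=-\gamma_d}^{\gamma_d-1}\Bigl[P\bigl(A(\{d,j\},k) = i-\gamma_j\bigr) + P\bigl(A(\{d,j\},k) = i+\gamma_j\bigr)\Bigr] - 2\sum_{i=-\gamma_d}^{\gamma_d-1} P\bigl(A(\{d,j\},k) = i\bigr).$$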



Since $S(j,k)$ and $P(A(\{d,j\},k) = i)$ do not depend on $p_j$, the partial derivative of (27) with respect to $p_j$ for all $j \ne d$ is

$$\frac{\partial\,(2H_d(k)-1)}{\partial p_j(k)} = (1-2p_j(k))\,S(j,k). \tag{28}$$

Obviously $\partial H_d(k)/\partial p_j(k) = 0$ at $q$. If $S(j,k) = 0$, then, by (27), $p_j$ does not have any contribution to the value of $H_d$ and, therefore, we let $p_j = 0.5$. If $S(j,k) \ne 0$, then, by (28), $p_j = 0.5$. This argument shows that the $H_d(k)$s are minimum at time 1, when $p(1) = (0.5,\ldots,0.5)$, i.e. $H_d = H_d(1)$.
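The claim that $H_d(k)$ is smallest at the uniform probability vector can be checked numerically on small instances. The sketch below is a brute-force illustration, not the paper's method: it assumes the identity $2H_d(k) - 1 = P(-\gamma_d \le A(\{d\},k) < \gamma_d)$ with $A(\{d\},k) = \sum_{i \ne d}\gamma_i z_i(k)$ and integer weights, and the function name `h_d` and the 0-based index `d` are choices made here.

```python
import itertools


def h_d(gammas, p, d):
    """Exact H_d for f(x) = sum(gammas[i] * x[i]) under probability vector p.

    Enumerates z in {-1, 0, 1}^(n-1), where P(z_i = 1) = P(z_i = -1)
    = p_i * (1 - p_i), and accumulates P(-gamma_d <= A < gamma_d).
    """
    others = [i for i in range(len(gammas)) if i != d]
    inside = 0.0
    for zs in itertools.product((-1, 0, 1), repeat=len(others)):
        prob, a = 1.0, 0
        for i, z in zip(others, zs):
            q = p[i] * (1 - p[i])
            prob *= q if z != 0 else 1 - 2 * q
            a += gammas[i] * z
        if -gammas[d] <= a < gammas[d]:
            inside += prob
    return 0.5 * (1 + inside)
```

Calling `h_d([1, 1, 1, 1], [0.5] * 4, 0)` gives $H_d(1)$ for a 4-bit OneMax instance; evaluating the same function at other probability vectors should never produce a smaller value if the argument above holds.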

Lemma 8. Let $f(x) = \sum_{i=1}^{n}\gamma_i x_i$ be a binary linear function with $\gamma_j \ge \gamma_i > 0$ for $1 \le i < j \le n$. Then $H_i \le H_j$. Besides, we have $1 \ge H_i \ge \frac{1}{2} + [\ldots]$ for all $n \ge i \ge 1$.

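Proof. Considering the fact that $p(1) = (0.5,\ldots,0.5)$, conditioning on $z_j(1)$ as in (27) gives

$$\begin{aligned}
2H_i - 1 &= P\bigl(-\gamma_i \le A(\{i\},1) < \gamma_i\bigr) \\
&= \tfrac{1}{4}P\bigl(-\gamma_i \le A(\{i,j\},1) - \gamma_j < \gamma_i\bigr) \\
&\quad + \tfrac{1}{2}P\bigl(-\gamma_i \le A(\{i,j\},1) < \gamma_i\bigr) \\
&\quad + \tfrac{1}{4}P\bigl(-\gamma_i \le A(\{i,j\},1) + \gamma_j < \gamma_i\bigr).
\end{aligned} \tag{29}$$

Since $\gamma_i - \gamma_j \le \gamma_j - \gamma_i$,

$$P\bigl(-\gamma_i \le \gamma_j + A(\{i,j\},1) < \gamma_i\bigr) \le P\bigl(-\gamma_j \le \gamma_i + A(\{i,j\},1) < \gamma_j\bigr).$$

In the same way, one argues that

$$\begin{aligned}
P\bigl(-\gamma_i \le A(\{i,j\},1) < \gamma_i\bigr) &\le P\bigl(-\gamma_j \le A(\{i,j\},1) < \gamma_j\bigr), \\
P\bigl(-\gamma_i \le -\gamma_j + A(\{i,j\},1) < \gamma_i\bigr) &\le P\bigl(-\gamma_j \le -\gamma_i + A(\{i,j\},1) < \gamma_j\bigr).
\end{aligned}$$

The combination of these inequalities and (29) proves $H_i \le H_j$. Since $0 < \gamma_1$,

$$2H_1 - 1 = P\bigl(-\gamma_1 \le A(\{1\},1) < \gamma_1\bigr) \ge P\bigl(A(\{1\},1) = 0\bigr) = [\ldots]$$

and consequently $H_i \ge \frac{1}{2} + [\ldots]$. □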

In the following two examples we compute the exact values of $H_d$ for two linear problems, giving us the opportunity to verify our results by conducting some simulations.

Example 5.1. The OneMax problem is a frequently used fitness function in theoretical research on evolutionary algorithms because of its simplicity. The fitness of an individual is equal to the number of bits set to one, i.e. $f(x) = \sum_{i=1}^{n} x_i$. This is an easy problem for UEDAs since there is no isolation or deception. For a fixed $d$, let $A = \sum_{i\ne d} c_i^{(a)}$ and $B = \sum_{i\ne d} c_i^{(b)}$, where $c^{(a)}$ and $c^{(b)}$ are defined as in Section 2. The above argument implies that $2H_d - 1 = P(-1 \le A - B < 1)$ with $p_j = 0.5$ for all $j \ne d$. Therefore one sees that $A$ and $B$ have the binomial distribution $B(n-1, \frac{1}{2})$. This gives

$$\begin{aligned}
P(A - B = z) &= \sum_{i=-n+1}^{n-1-z} P(A = i)\,P(B = i+z) \\
&= \sum_{i=-n+1}^{n-1-z} \binom{n-1}{i}\left(\tfrac{1}{2}\right)^{i}\left(\tfrac{1}{2}\right)^{n-1-i}\binom{n-1}{i+z}\left(\tfrac{1}{2}\right)^{i+z}\left(\tfrac{1}{2}\right)^{n-1-i-z} \\
&= \left(\tfrac{1}{2}\right)^{2n-2}\binom{2n-2}{n-1+z}.
\end{aligned}$$

Since $P(A - B = -1) = P(A - B = 1)$, we have

$$H_d - \frac{1}{2} = \frac{1}{2}\bigl(P(A-B=0) + P(A-B=1)\bigr) = \left(\tfrac{1}{2}\right)^{2n-1}\left[\binom{2n-2}{n-1} + \binom{2n-2}{n}\right] = \left(\tfrac{1}{2}\right)^{2n-1}\binom{2n-1}{n}.$$
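The closed form for the OneMax case can be cross-checked against a direct convolution of two $B(n-1, \frac{1}{2})$ distributions. The following Python sketch does this under the assumption that $H_d = \frac{1}{2} + \frac{1}{2}\bigl(P(A-B=0) + P(A-B=1)\bigr)$, as in the example; the function names are illustrative.

```python
from math import comb


def onemax_h_closed(n):
    """H_d for n-bit OneMax from the closed form 1/2 + C(2n-1, n) / 2^(2n-1)."""
    return 0.5 + comb(2 * n - 1, n) / 2 ** (2 * n - 1)


def onemax_h_convolution(n):
    """H_d for n-bit OneMax from the distribution of the difference of two binomials."""
    # pmf[i] = P(A = i) = P(B = i) for A, B ~ Binomial(n - 1, 1/2)
    pmf = [comb(n - 1, i) / 2 ** (n - 1) for i in range(n)]

    def p_diff(z):
        # P(B - A = z), which equals P(A - B = z) by symmetry
        return sum(pmf[i] * pmf[i + z] for i in range(n) if 0 <= i + z < n)

    return 0.5 + 0.5 * (p_diff(0) + p_diff(1))
```

Both functions should agree up to floating-point rounding for any $n \ge 1$; the closed form is the cheaper one to evaluate for large $n$.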



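Example 5.2. The BinVal problem is another fitness function commonly used in theoretical research. The fitness of an individual is equal to the integer (in decimal base) represented by the individual, i.e. $f(x) = \sum_{i=1}^{n} 2^{i-1}x_i$. For a fixed $1 \le d \le n$ and $a, b \in \Omega$, let $A = \sum_{i=1}^{n} 2^{i-1}c_i^{(a)}$ and $B = \sum_{i=1}^{n} 2^{i-1}c_i^{(b)}$. Since $P(A = B \mid c_d^{(a)} = 1, c_d^{(b)} = 0) = 0$, we have $P(A \ge B \mid c_d^{(a)} = 1, c_d^{(b)} = 0) = P(A > B \mid c_d^{(a)} = 1, c_d^{(b)} = 0)$. Let $t$ be the largest index such that $c_t^{(a)} = 1$, $c_t^{(b)} = 0$ and $c_j^{(a)} = c_j^{(b)}$ for all $n \ge j \ge t+1$. Note that $n \ge t \ge d$ because $c_d^{(a)} = 1$, $c_d^{(b)} = 0$. Since, for a given $j$, the coefficient $2^{j-1}$ of $x_j$ is larger than the sum $\sum_{i=1}^{j-1} 2^{i-1} = 2^{j-1} - 1$, $f(a) > f(b)$ if and only if we have $t = i$, where $i$ is the largest index with $c_i^{(a)} \ne c_i^{(b)}$. In this case, the values of $c_1^{(a)},\ldots,c_{t-1}^{(a)}$ and $c_1^{(b)},\ldots,c_{t-1}^{(b)}$ do not have any influence on the inequality $f(a) > f(b)$. Thus for $d < n$

$$\begin{aligned}
H_d &= \tfrac{1}{2}\Bigl(P\bigl(A > B \mid c_d^{(a)} = 1, c_d^{(b)} = 0\bigr) + P\bigl(A \ge B \mid c_d^{(a)} = 1, c_d^{(b)} = 0\bigr)\Bigr) \\
&= \sum_{i=d}^{n} P(t = i) \\
&= \underbrace{\prod_{j=d+1}^{n} P\bigl(c_j^{(a)} = c_j^{(b)}\bigr)}_{P(t=d)} + \sum_{i=d+1}^{n-1}\underbrace{P\bigl(c_i^{(a)} = 1, c_i^{(b)} = 0\bigr)\prod_{j=i+1}^{n} P\bigl(c_j^{(a)} = c_j^{(b)}\bigr)}_{P(t=i)} + \underbrace{P\bigl(c_n^{(a)} = 1\bigr)P\bigl(c_n^{(b)} = 0\bigr)}_{P(t=n)} \\
&= \left(\tfrac{1}{2}\right)^{n-d} + \sum_{i=d+1}^{n-1}\tfrac{1}{4}\left(\tfrac{1}{2}\right)^{n-i} + \tfrac{1}{4} = \frac{1}{2} + \frac{1}{2^{n-d+1}},
\end{aligned}$$

and for $d = n$, $H_d = 1$.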
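For small $n$, the BinVal value $H_d = \frac{1}{2} + 2^{-(n-d+1)}$ can be confirmed by exhaustive enumeration. The sketch below assumes that, at $p(1) = (0.5,\ldots,0.5)$, $H_d$ equals the probability that a uniformly random individual with bit $d$ set beats a uniformly random individual with bit $d$ cleared; the helper names are hypothetical and $d$ is 1-based as in the text.

```python
from itertools import product


def binval(x):
    """BinVal fitness: x[i] (0-based) carries weight 2**i."""
    return sum(bit << i for i, bit in enumerate(x))


def binval_h_bruteforce(n, d):
    """P(f(a) > f(b) | a_d = 1, b_d = 0) over uniformly random n-bit a and b."""
    wins = total = 0
    for a in product((0, 1), repeat=n):
        if a[d - 1] != 1:
            continue
        for b in product((0, 1), repeat=n):
            if b[d - 1] != 0:
                continue
            total += 1
            wins += binval(a) > binval(b)
    return wins / total


def binval_h_closed(n, d):
    """Closed form 1/2 + 2^-(n-d+1); it also gives 1 when d = n."""
    return 0.5 + 0.5 ** (n - d + 1)
```

The enumeration visits $4^{n-1}$ admissible pairs per bit, so it is only practical for small $n$; its purpose is to sanity-check the closed form, not to replace it.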

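In general, when $n$ is large enough, an approximation of $H_d$ for $f(x) = \sum_{i=1}^{n}\gamma_i x_i$ with $\gamma_i > 0$ can be computed as follows. Let us define $F_d(x,k) = \sum_{i\ne d}\gamma_i x_i(k)$. The Central Limit Theorem (Durrett 1995) implies that $F_d(x,k)$ converges in distribution to the normal distribution $N(M_d(k), \Sigma_d^2(k))$, where $M_d(k) = \sum_{i\ne d} p_i(k)\gamma_i$ and $\Sigma_d^2(k) = \sum_{i\ne d} p_i(k)(1-p_i(k))\gamma_i^2$. Since $\Delta_F = F_d(w(k),k) - F_d(l(k),k)$ has distribution $N(0, 2\Sigma_d^2(k))$,

$$H_d(k) \approx \frac{1}{2} + \frac{1}{2}\int_{-\gamma_d}^{\gamma_d} N\bigl(0, 2\Sigma_d^2(k)\bigr)\,d\Delta_F. \tag{30}$$

Obviously, $H_d(k)$ will be minimum when $\Sigma_d^2(k)$ is maximum. By the arithmetic-geometric mean inequality, one sees

$$\Sigma_d^2(k) = \sum_{i\ne d} p_i(k)(1-p_i(k))\gamma_i^2 \le \frac{1}{4}\sum_{i\ne d}\gamma_i^2.$$

Thus (30) gives

$$H_d \approx \frac{1}{2} + \frac{1}{2}\int_{-\gamma_d}^{\gamma_d} N\Bigl(0, \tfrac{1}{2}\sum_{i\ne d}\gamma_i^2\Bigr)\,d\Delta_F = \Phi\!\left(\frac{\sqrt{2}\,\gamma_d}{\sqrt{\sum_{i\ne d}\gamma_i^2}}\right),$$

where $\Phi(\cdot)$ is the standard normal cumulative distribution function.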
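To get a feel for the quality of the normal approximation, one can compare it with the exact OneMax value $\frac{1}{2} + \binom{2n-1}{n}/2^{2n-1}$. The sketch below is an illustration under the assumption that the approximation takes the form $H_d \approx \Phi\bigl(\gamma_d / \sqrt{\tfrac{1}{2}\sum_{i\ne d}\gamma_i^2}\bigr)$; the function names and the 0-based index `d` are choices made here, and at least two weights are required so that the variance is positive.

```python
import math


def phi(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))


def h_normal_approx(gammas, d):
    """Normal approximation of H_d for a linear function with weights gammas."""
    # Variance of the fitness difference at p = (0.5, ..., 0.5), excluding bit d
    var = 0.5 * sum(g * g for i, g in enumerate(gammas) if i != d)
    return phi(gammas[d] / math.sqrt(var))


def h_onemax_exact(n):
    """Exact OneMax value of H_d for comparison."""
    return 0.5 + math.comb(2 * n - 1, n) / 2 ** (2 * n - 1)
```

Comparing `h_normal_approx([1] * n, 0)` with `h_onemax_exact(n)` for growing `n` shows how the gap between the approximation and the exact value behaves as the problem size increases.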

The remainder of this section verifies the theoretical bounds on the optimal convergence probability of UEDAs. The experiments reported in this section are for the OneMax problem. All results are averages over 1000 independent runs of the algorithms. For the cGA, each run was terminated when the PV had converged completely; for the PBIL, since the PV does not converge in finite time, each run was terminated as soon as, for each $1 \le i \le n$, either $p_i < \epsilon$ or $p_i > 1 - \epsilon$ for a small threshold $\epsilon$ (a negative power of 10). We report the percentage of runs that converged to the optimal solution. The theoretical lower bounds of the cGA and the PBIL are computed using (22) and (25), respectively.

In Figures 1 and 2, the bold lines are the theoretical lower bounds and the dotted lines are the experimental results for the cGA and the PBIL, respectively, when maximizing 5-bit and 100-bit OneMax problems. As is clear from the figures, in the case of the OneMax problem the lower bound obtained for the cGA is sharper than the lower bound for the PBIL. One main reason for this difference is the optimality of the computed $b$ for the cGA; refer to the first remark in Section 4.2 for details. Also, simulation shows that the lower bounds obtained in this paper are in general sharper for the OneMax problem than for the BinVal problem (compare Figures 1, 2 and 3 for an example). The main reason for this difference is that all bits contribute equally in the OneMax problem, so considering one-bit subproblems when deriving the lower bound is a reasonable decision; for the BinVal problem, however, the contributions of different bits are very different, and by dividing the problem into one-bit problems we lose a lot of information about the dynamics of the algorithm. The author believes that the bounds will improve considerably if 2-bit subproblems are used.
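The experimental protocol for the cGA can be sketched in a few lines of Python. This is a minimal illustration and not the code used for the reported figures: it assumes the textbook compact GA update (two sampled individuals per iteration, each differing bit of the PV moved by $1/N$ toward the winner), breaks fitness ties in favour of the first individual, and uses hypothetical names such as `cga_run_onemax` and `pop_size` (which must be even so that the PV starts at 0.5).

```python
import random


def cga_run_onemax(n_bits, pop_size, rng):
    """One cGA run on OneMax; returns True if the PV converges to all ones."""
    # The PV is stored as integer counts: p_i = counts[i] / pop_size, starting at 0.5.
    counts = [pop_size // 2] * n_bits
    while any(0 < c < pop_size for c in counts):
        a = [1 if rng.randrange(pop_size) < c else 0 for c in counts]
        b = [1 if rng.randrange(pop_size) < c else 0 for c in counts]
        winner, loser = (a, b) if sum(a) >= sum(b) else (b, a)
        for i in range(n_bits):
            if winner[i] != loser[i]:
                counts[i] += 1 if winner[i] else -1
    return all(c == pop_size for c in counts)


def convergence_rate(n_bits, pop_size, runs, seed=0):
    """Fraction of independent runs that converge to the optimal solution."""
    rng = random.Random(seed)
    hits = sum(cga_run_onemax(n_bits, pop_size, rng) for _ in range(runs))
    return hits / runs
```

A call such as `convergence_rate(5, 20, 1000)` mirrors the 1000-run averaging described above for the 5-bit problem; sweeping `pop_size` would trace an experimental curve of the kind plotted against the theoretical lower bound.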

6 Conclusion 

The UEDAs are very simple and can be easily implemented in hardware. Because they use a small amount of memory, they may have many applications in memory-constrained problems. In addition, the theoretical study of these algorithms is very helpful for developing methods needed for the analysis of more complicated EDAs. This paper gives new theoretical results on the cGA and the PBIL, two algorithms of this kind, which use probability distributions without dependencies between different components. The first part of the paper describes a derivation of lower bounds on the probability with which the cGA and the PBIL converge to the optimal solution. The approach closely follows a general approach proposed by (Norman 1972), with several potential applications to the theory of evolutionary algorithms. The bounds are used to prove that the cGA and the PBIL converge almost surely to optimal solutions of functions with Property 1 as the learning rate (resp. population size) tends to zero (resp. infinity). Exact values of the $H_d$s are computed for the OneMax and the BinVal problems, and an approximation is given for the $H_d$s of linear functions when the size of the problem is sufficiently large.

Figure 1: Experimental and theoretical results of the optimal convergence probability of the cGA on 5-bit and 100-bit OneMax problems. The theoretical lower bound is the bold line and the experimental result is the dotted line.

There are several natural extensions of the results here. The first extension is to compute the $H_d$s for nonlinear functions satisfying Property 1. Since Property 1 considers only 1-bit building blocks, another extension would be to consider other building block sizes. This would perhaps improve the bounds, especially for the BinVal problem. Finding an appropriate form of super-regular function could also be used to find upper bounds. Having an upper bound gives us a better picture of the behaviour of the algorithms, and the average of the upper and lower bounds could be a better estimate of the optimal convergence probability of the algorithms.

Figure 2: Experimental and theoretical results of the optimal convergence probability of the PBIL with $\lambda = 5$ on 5-bit and 100-bit OneMax problems. The theoretical lower bound is the bold line and the experimental result is the dotted line.

Figure 3: Experimental and theoretical results of the optimal convergence probability of the cGA and the PBIL on a 5-bit BinVal problem. The theoretical lower bound is the bold line and the experimental result is the dotted line.